Learning values across many orders of magnitude
Authors
Abstract
Most learning algorithms are not invariant to the scale of the function that is being approximated. We propose to adaptively normalize the targets used in learning. This is useful in value-based reinforcement learning, where the magnitude of appropriate value approximations can change over time when we update the policy of behavior. Our main motivation is prior work on learning to play Atari games, where the rewards were all clipped to a predetermined range. This clipping facilitates learning across many different games with a single learning algorithm, but a clipped reward function can result in qualitatively different behavior. Using the adaptive normalization we can remove this domain-specific heuristic without diminishing overall performance.
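To make the idea concrete, here is a minimal sketch of adaptive target normalization in Python (NumPy), in the spirit of the paper's output-preserving rescaling (Pop-Art). It assumes a network whose last layer is linear with weights W and bias b; the running statistics are tracked with an exponential moving average, and the rate beta, the variable names, and the EMA form are illustrative assumptions rather than values from the paper:

    import numpy as np

    class AdaptiveTargetNormalizer:
        # Sketch only: tracks a running mean/scale of the targets and
        # keeps the unnormalized prediction sigma * (W h + b) + mu
        # unchanged when the statistics are updated.
        def __init__(self, n_out, beta=1e-4, eps=1e-8):
            self.mu = np.zeros(n_out)   # running mean of targets
            self.nu = np.ones(n_out)    # running second moment
            self.beta = beta            # EMA step size (assumed value)
            self.eps = eps

        @property
        def sigma(self):
            return np.sqrt(np.maximum(self.nu - self.mu ** 2, self.eps))

        def update(self, target, W, b):
            # Move the statistics toward the new target ...
            mu_old, sigma_old = self.mu.copy(), self.sigma
            self.mu = (1 - self.beta) * self.mu + self.beta * target
            self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
            sigma_new = self.sigma
            # ... and rescale the output layer in place so the
            # unnormalized prediction is exactly what it was before.
            W *= (sigma_old / sigma_new)[:, None]
            b[:] = (sigma_old * b + mu_old - self.mu) / sigma_new
            return (target - self.mu) / sigma_new  # normalized target

The network is then trained to regress the normalized target with an ordinary squared loss, while sigma * output + mu is used wherever the value estimate itself is needed, so no reward clipping is required.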
Similar papers
The behavior of the reliability functions and stochastic orders in family of the Kumaraswamy-G distributions
The Kumaraswamy distribution is a two-parameter distribution on the interval (0,1) that is very similar to the beta distribution. It is applicable to many natural phenomena whose outcomes have lower and upper bounds, such as the proportion of people in a society who consume certain products in a given interval. In this paper, we introduce the family of Kumaraswamy-G distributions, an...
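For reference (these are standard facts about the Kumaraswamy(a, b) distribution, not taken from this abstract), its density and distribution function are available in closed form, which also gives an exact inverse-transform sampler:

    import numpy as np

    def kumaraswamy_pdf(x, a, b):
        # f(x) = a * b * x^(a-1) * (1 - x^a)^(b-1) on (0, 1)
        return a * b * x ** (a - 1) * (1 - x ** a) ** (b - 1)

    def kumaraswamy_cdf(x, a, b):
        # F(x) = 1 - (1 - x^a)^b, invertible in closed form
        return 1 - (1 - x ** a) ** b

    def kumaraswamy_sample(a, b, size=None, rng=None):
        # Inverse-transform sampling: solve F(x) = u for x.
        rng = rng or np.random.default_rng()
        u = rng.uniform(size=size)
        return (1 - (1 - u) ** (1 / b)) ** (1 / a)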
An Approach to Qualitative Radial Basis Function Networks over Orders of Magnitude
This paper lies within the domain of supervised learning algorithms based on neural networks whose architecture corresponds to radial basis functions. A methodology is developed for using RBF networks when the descriptors of the patterns are given by means of their orders of magnitude. A qualitative distance is constructed over the discrete structure of absolute orders of magnitude spaces. This distance is...
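The snippet ends before the method details, but the underlying architecture is a standard RBF network; a minimal numeric version is sketched below. The paper's contribution is, in effect, to replace the Euclidean distance inside _phi with a qualitative distance over absolute orders-of-magnitude spaces; the Gaussian kernel and the ridge least-squares fit shown here are illustrative assumptions:

    import numpy as np

    class RBFNetwork:
        # Plain Gaussian RBF network: each hidden unit responds to the
        # distance between the input and a center; the output is a
        # linear combination of those responses.
        def __init__(self, centers, width):
            self.centers = np.asarray(centers)     # (k, d) center points
            self.width = width                     # shared bandwidth
            self.weights = np.zeros(len(centers))  # linear output layer

        def _phi(self, x):
            # Euclidean distance here; the paper substitutes a
            # qualitative orders-of-magnitude distance.
            d2 = ((x - self.centers) ** 2).sum(axis=1)
            return np.exp(-d2 / (2 * self.width ** 2))

        def predict(self, x):
            return self._phi(x) @ self.weights

        def fit(self, X, y, reg=1e-6):
            # Solve the output weights by ridge-regularized least squares.
            Phi = np.stack([self._phi(x) for x in X])
            A = Phi.T @ Phi + reg * np.eye(Phi.shape[1])
            self.weights = np.linalg.solve(A, Phi.T @ y)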
Image Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing, where many low-level representations of images exist, e.g., SIFT, HOG, and so on. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation, and principal component analysis, are employed to d...
Neural Episodic Control
Deep reinforcement learning methods attain super-human performance in a wide range of environments. Such methods are grossly inefficient, often taking orders of magnitude more data than humans to achieve reasonable performance. We propose Neural Episodic Control: a deep reinforcement learning agent that is able to rapidly assimilate new experiences and act upon them. Our agent uses a semi-tabu...
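The truncated "semi-tabu..." refers to a semi-tabular representation: an episodic memory of (embedding, value) pairs read with a kernel-weighted nearest-neighbour lookup. A minimal sketch follows; the inverse-distance kernel matches the paper's description, while the brute-force search and the parameter values are simplifying assumptions (the paper uses an approximate-kNN differentiable neural dictionary):

    import numpy as np

    class EpisodicMemory:
        # Stores (key, value) pairs, where keys are state embeddings
        # and values are return estimates. A query returns a
        # kernel-weighted average over the p nearest stored keys.
        def __init__(self, p=50, delta=1e-3):
            self.keys, self.values = [], []
            self.p = p          # neighbours used per lookup (assumed)
            self.delta = delta  # kernel smoothing constant

        def write(self, key, value):
            self.keys.append(np.asarray(key))
            self.values.append(float(value))

        def lookup(self, query):
            keys = np.stack(self.keys)
            d2 = ((keys - query) ** 2).sum(axis=1)
            idx = np.argsort(d2)[: self.p]    # brute-force kNN (sketch)
            w = 1.0 / (d2[idx] + self.delta)  # inverse-distance kernel
            w /= w.sum()
            return float(w @ np.asarray(self.values)[idx])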
Reinforcement Learning for Games: Failures and Successes. CMA-ES and TDL in comparison
We apply CMA-ES, an evolution strategy with covariance matrix adaptation, and TDL (Temporal Difference Learning) to reinforcement learning tasks. In both cases these algorithms seek to optimize a neural network which provides the policy for playing a simple game (TicTacToe). Our contribution is to study the effect of varying learning conditions on learning speed and quality. Certain initial fai...
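For context, the TDL half of the comparison boils down to temporal-difference updates of a value function. A minimal linear TD(0) step is sketched below; the paper trains a neural network rather than the linear model assumed here, and the step size and discount are illustrative defaults:

    import numpy as np

    def td0_update(theta, phi_s, reward, phi_next,
                   alpha=0.1, gamma=1.0, terminal=False):
        # One TD(0) step for a linear value function V(s) = theta @ phi(s).
        v = theta @ phi_s
        v_next = 0.0 if terminal else theta @ phi_next
        delta = reward + gamma * v_next - v    # TD error
        theta = theta + alpha * delta * phi_s  # move V(s) toward the target
        return theta, delta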
Journal: CoRR
Volume: abs/1602.07714
Pages: -
Publication date: 2016